Using Network Analysis to Improve Nearest Neighbor Classification of Non-network Data
Abstract
The nearest neighbor classifier is a powerful, straightforward, and very popular approach to many classification problems. It also lets users easily incorporate weights of training instances into its model, so that more promising examples can be emphasized. Instance weighting schemes proposed to date have been based either on attribute values or on external knowledge. In this paper, we propose a new way of weighting instances based on network analysis and centrality measures. Our method transforms the training dataset into a weighted signed network and evaluates the importance of each node using a selected centrality measure. This information is then transferred back to the training dataset in the form of instance weights, which are later used during nearest neighbor classification. We consider four centrality measures appropriate for our problem and empirically evaluate our proposal on 30 popular, publicly available datasets. The results show that the proposed instance weighting improves the predictive performance of the nearest neighbor algorithm.
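The abstract does not say how the signed network is built or which four centrality measures are used, so the following is only a minimal sketch of the overall pipeline under stated assumptions, not the authors' method. It assumes that each training instance is linked to its k nearest neighbors, that an edge is positive when both endpoints share a class label and negative otherwise (magnitude 1 / (1 + distance)), and that a simple signed weighted degree stands in for the centrality measure. All function names (`build_signed_network`, `signed_degree`, `to_instance_weights`, `weighted_knn_predict`) are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_signed_network(X, y, k):
    """Assumed construction: connect every instance to its k nearest
    training neighbours. The edge sign is +1 for the same class and -1
    otherwise; its magnitude 1 / (1 + distance) favours close pairs."""
    adj = defaultdict(dict)
    for i in range(len(X)):
        dists = sorted((euclidean(X[i], X[j]), j) for j in range(len(X)) if j != i)
        for d, j in dists[:k]:
            w = (1.0 if y[i] == y[j] else -1.0) / (1.0 + d)
            adj[i][j] = w  # undirected: store the edge in both directions
            adj[j][i] = w
    return adj


def signed_degree(adj, n):
    """Stand-in centrality (assumption): sum of signed edge weights at a node."""
    return [sum(adj[i].values()) for i in range(n)]


def to_instance_weights(scores, floor=0.1):
    """Min-max scale centrality scores into [floor, 1] so every weight stays positive."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [floor + (1.0 - floor) * (s - lo) / (hi - lo) for s in scores]


def weighted_knn_predict(X, y, weights, query, k):
    """Each of the k nearest neighbours votes for its class with its instance weight."""
    nearest = sorted(range(len(X)), key=lambda j: euclidean(X[j], query))[:k]
    votes = defaultdict(float)
    for j in nearest:
        votes[y[j]] += weights[j]
    return max(votes, key=votes.get)


# Toy data: two clusters plus one mislabelled point at (0.5, 0.5).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

adj = build_signed_network(X, y, k=3)
weights = to_instance_weights(signed_degree(adj, len(X)))
print(weighted_knn_predict(X, y, weights, (0.55, 0.55), k=2))  # prints a
```

In this toy data the mislabeled point at (0.5, 0.5) has only negative edges, so it receives the lowest weight; a query right next to it is then assigned to class "a", whereas an unweighted 2-NN vote between that point and (1, 1) would be a tie. Any of the centrality measures considered in the paper could replace `signed_degree` without changing the rest of the pipeline.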
Similar resources
Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor
Over the last two decades, improvements in developing computational tools have made significant contributions to the classification of images of biological specimens into their corresponding species. These days, identification of biological species is much easier for taxonomists and even non-taxonomists due to the development of automated computer techniques and systems. In this study, we d...
Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest
This study contributes to assessing high-resolution digital aerial imagery for the semi-automatic identification of tree species. To maximize the benefit of such data, object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using the aero-triangulation method. Some appropriate transformations were perfor...
Detecting Diseases in Medical Prescriptions Using Data Mining Tools and Combining Techniques
Data about the prevalence of communicable and non-communicable diseases, as one of the most important categories of epidemiological data, is used for interpreting health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study is collected from 1412 prescriptions for various ty...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to many kinds of library resources. However, classifying documents within a large amount of data is still an issue and demands time and energy to find particular documents. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...